Acute Myeloid Leukemia (AML) is a biologically heterogeneous and clinically aggressive malignancy characterized by poor survival and high relapse rates. Although cytogenetic and mutational profiling have advanced risk stratification, they fall short of capturing the complex transcriptional and post-transcriptional regulatory mechanisms underlying disease progression. Some gene and miRNA biomarkers have been identified in AML, but the predictive relevance of many potentially informative candidates remains unexplored. Integrative multi-omics approaches, especially those combining gene and miRNA expression, offer a powerful means to uncover critical regulatory networks and to improve molecular classification, prognostic modeling, and therapeutic development in AML. Here, models are evaluated against the European LeukemiaNet (ELN) risk classification, a clinically validated standard for AML risk stratification based on cytogenetic features.

In this study, we performed a comprehensive multi-cohort analysis of gene and miRNA expression to discover AML risk-specific biomarkers, using two independent datasets for each data type: The Cancer Genome Atlas (TCGA) and Beat AML (BEATAML) for gene expression, and TCGA and Genomics of Acute Myeloid Leukemia (GAML) for miRNA expression. We applied Principal Component Analysis (PCA) to capture significant expression variation and reduce dimensionality, advancing beyond traditional single-gene models. This was followed by survival-based feature selection, and Support Vector Machine (SVM) classifiers were then used to validate how well the identified features predict the three ELN categories. To elucidate regulatory mechanisms, we integrated expression correlation analysis with experimentally validated miRNA–gene interactions from TarBase.
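As a rough illustration of the PCA and SVM steps described above, the following Python sketch ranks features by their loadings on the leading principal components and then estimates a cross-validated one-vs-rest AUC for a three-class SVM. It is a minimal sketch under stated assumptions, not the pipeline used in the study: the data are synthetic, the helper `top_loading_features`, the loading-based ranking rule, the linear kernel, and all parameter values are hypothetical choices, and the survival-based selection step is omitted.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for an expression matrix: 150 patients x 200 features.
n_patients, n_features = 150, 200
# Hypothetical labels: 0 = favorable, 1 = intermediate, 2 = adverse.
risk = np.repeat([0, 1, 2], n_patients // 3)
X = rng.normal(size=(n_patients, n_features))
X[:, :10] += 1.5 * risk[:, None]  # the first 10 features carry a risk-related signal


def top_loading_features(X, n_components=5, n_keep=20):
    """Rank features by their largest absolute loading on the leading PCs.

    This ranking rule is an assumption made for illustration only.
    """
    X_std = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components).fit(X_std)
    score = np.abs(pca.components_).max(axis=0)
    return np.argsort(score)[::-1][:n_keep]


# Unsupervised selection: the labels are not used at this step.
selected = top_loading_features(X)

# Three-class SVM evaluated with stratified 5-fold cross-validation.
clf = make_pipeline(
    StandardScaler(),
    SVC(kernel="linear", probability=True, random_state=0),
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
proba = cross_val_predict(clf, X[:, selected], risk, cv=cv, method="predict_proba")

# Macro-averaged one-vs-rest AUC across the three classes.
auc = roc_auc_score(risk, proba, multi_class="ovr", average="macro")
print(f"selected {selected.size} features, macro one-vs-rest AUC = {auc:.3f}")
```

Because the feature ranking here ignores the class labels, running it once on the full matrix does not leak label information into the cross-validation; a supervised selection step, such as survival-based filtering, would need to be nested inside each fold.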

We identified 350 and 786 AML-associated genes by PCA in the TCGA and BEATAML cohorts, respectively, with 31 genes consistently associated with patient survival across both cohorts. These included known AML-associated genes such as CTGF, TAL1, and FHL2, supporting the reliability of our feature selection strategy. We also identified novel genes (e.g., MRPL16, CCDC57, and NAGLU) that contributed substantially to classifying patients into the three ELN categories. Classifiers trained exclusively on novel genes classified patients into ELN categories about as well as those trained on known genes (AUC = 0.856 vs. 0.853). Combining top-ranked known and novel genes yielded the highest predictive performance (AUC = 0.919), underscoring the complementary value of novel biomarkers in AML.

Parallel analysis of miRNA expression identified 248 and 77 miRNAs significantly associated with AML in TCGA and GAML, respectively, with 37 shared across cohorts. Key features included both known AML-associated miRNAs (e.g., hsa-miR-181a-3p, hsa-miR-142-3p, and hsa-miR-26a-5p) and novel candidates such as hsa-miR-3613-5p, hsa-miR-942-5p, and hsa-miR-618. Classifiers based solely on novel miRNAs achieved strong predictive performance (AUC ≥ 0.89), while those using top-ranked miRNAs attained the highest performance (AUC = 0.923), highlighting the diagnostic potential of underexplored miRNA features.

To investigate regulatory interactions, we analyzed expression correlations (Spearman ρ = 0.25–0.30, p < 0.05) among significant miRNA–gene pairs from TarBase, a database of experimentally validated miRNA targets. We identified coherent pairs, including C1QBP: hsa-miR-942-5p and FHL2: hsa-miR-3613-5p. C1QBP, a mitochondrial protein, and FHL2, a LIM-domain transcriptional regulator, are both known oncogenes implicated in AML. Their expression increased from favorable to adverse ELN risk groups, while the corresponding miRNAs showed inverse trends, suggesting that these miRNAs interact with their targets and that the pairs jointly contribute to AML pathogenesis.
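The correlation screen described above can be sketched in Python as follows. This is only an illustrative example on synthetic data: the pair names `gene_A`, `gene_B`, and `mir_X`, the helper `correlated_pairs`, and the simulated inverse relationship are hypothetical, and in practice the candidate pairs would be restricted to interactions listed in a curated resource such as TarBase.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n_patients = 120

# Synthetic, hypothetical expression profiles across patients.
mirna = rng.normal(size=n_patients)
gene_linked = -0.6 * mirna + rng.normal(scale=0.8, size=n_patients)  # inversely related
gene_unlinked = rng.normal(size=n_patients)                          # no built-in relation

# Candidate (gene, miRNA) pairs; the names are placeholders, not real identifiers.
candidate_pairs = {
    ("gene_A", "mir_X"): (gene_linked, mirna),
    ("gene_B", "mir_X"): (gene_unlinked, mirna),
}


def correlated_pairs(pairs, alpha=0.05):
    """Keep the pairs whose Spearman correlation has a p-value below alpha."""
    kept = {}
    for name, (gene_expr, mirna_expr) in pairs.items():
        rho, p_value = spearmanr(gene_expr, mirna_expr)
        if p_value < alpha:
            kept[name] = (float(rho), float(p_value))
    return kept


results = correlated_pairs(candidate_pairs)
for (gene, mir), (rho, p_value) in results.items():
    print(f"{gene}: {mir}  rho = {rho:+.2f}, p = {p_value:.2g}")
```

Spearman correlation works on ranks, so it captures monotonic relationships without assuming linearity or normally distributed expression values. When many pairs are screened at once, a multiple-testing correction would normally be applied to the p-values; it is left out here for brevity.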

In conclusion, our integrative multi-omics framework identified novel genes, miRNAs, and regulatory interactions that are biologically meaningful and clinically pertinent for AML. By integrating machine learning, survival-informed feature selection, and experimentally confirmed interactions, we identified robust biomarkers with strong predictive and mechanistic potential. The convergence of gene and miRNA signatures across independent cohorts provides a biologically grounded strategy for refining AML risk stratification and informing therapeutic development.

Acknowledgment: Funded by ACS IRG #22-151-37-IRG and the MCW Cancer Center.
